ParaPhraser: Russian Paraphrase Corpus and Shared Task
نویسندگان
چکیده
The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St.-Petersburg, Russia, in October 2016. Research in the area of paraphrase extraction, detection and generation has been successfully developing for a long time while there has been only a recent surge of interest towards the problem in the Russian community of computational linguistics. We try to overcome this gap by introducing the project ParaPhraser.ru dedicated to the collection of Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task, which uses the corpus as the training data. The participants of the task applied a wide variety of techniques to the problem of paraphrase detection, from rule-based approaches to deep learning, and results of the task reflect the following tendencies: the best scores are obtained by the strategy of using traditional classifiers combined with finegrained linguistic features, however, complex neural networks, shallow methods and purely technical methods also demonstrate competitive results.
منابع مشابه
Sentence Paraphrase Graphs: Classification Based on Predictive Models or Annotators' Decisions?
As part of our project ParaPhraser on the identification and classification of Russian paraphrase, we have collected a corpus of more than 8000 sentence pairs annotated as precise, loose or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers, but from the point of view of the expert, our complex paraphrase detection model can be more successful at predict...
متن کاملParaphrasing of Chinese Utterances
One of the key issues in spoken language translation is how to deal with unrestricted expressions in spontaneous utterances. This research is centered on the development of a Chinese paraphraser that automatically paraphrases utterances prior to transfer in Chinese-Japanese spoken language translation. In this paper, a pattern-based approach to paraphrasing is proposed for which only morphologi...
متن کاملCUSAT_NLP@DPIL-FIRE2016: Malayalam Paraphrase Detection
This paper describes an approach for paraphrase detection in Malayalam sentences developed as part of FIRE 2016 Shared Task on Paraphrase detection in Indian Languages. The task of paraphrasedetection is finding a sentence with the same meaning of another sentence expressed using same or different words. This detection is done by a semantic approach which is language dependent. Individual words...
متن کاملDPIL@FIRE2016: Overview of the Shared task on Detecting Paraphrases in Indian language
This paper explains the overview of the shared task "Detecting Paraphrases in Indian Languages" (DPIL) conducted at FIRE 2016. Given a pair of sentences in the same language, participants are asked to detect the semantic equivalence between the sentences. The shared task is proposed for four Indian languages namely Tamil, Malayalam, Hindi and Punjabi. The dataset created for the shared task has...
متن کاملSemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
In this shared task, we present evaluations on two related tasks Paraphrase Identification (PI) and Semantic Textual Similarity (SS) systems for the Twitter data. Given a pair of sentences, participants are asked to produce a binary yes/no judgement or a graded score to measure their semantic equivalence. The task features a newly constructed Twitter Paraphrase Corpus that contains 18,762 sente...
متن کامل